Virtual Machine On VMware ESXi Hypervisor Will Stop Responding or Fail to Power On When Configured With the NVIDIA A40/A10 PCIe Graphics Accelerator As a "Passthrough" Device

Article ID: 319788



Products

VMware vSphere ESXi

Issue/Introduction

A VM with a GPU in passthrough mode fails to start on the following HPE server models:

 

  • HPE Apollo 6500 Gen10 Plus System.
  • HPE ProLiant DL380 Gen10 server.
  • HPE ProLiant DL380 Gen10 Plus server.
  • HPE ProLiant DL385 Gen10 Plus server.
  • HPE ProLiant DL385 Gen10 server.
  • HPE ProLiant XL645d Gen10 Plus Configure-to-order Server.
  • HPE ProLiant XL675d Gen10 Plus Configure-to-order Server.



ESXi 7.0 Update 3n fails to start a VM that has a GPU attached to it via passthrough.

The VM fails to start with the error: Module DevicePowerOn power on failed. Failed to start the virtual machine. Device 6:0.0 is not a passthrough device.

You may also see hot-plug events like the following in the vmkernel log while the VM powers on.

 

Log File: /var/run/log/vmkernel.log

[YYYY-MM-DDTHH:MM:SS] cpu144:15901592)PCIPassthru: 3873: pcipDevInfo(0x43174d483300) allocated for 0000:a9:00.0
[YYYY-MM-DDTHH:MM:SS] cpu98:2097948)PCIEHP: 1564: 0000:a8:01.0: hotplug slot:0x1: num reads=1 slot status=0x108.
[YYYY-MM-DDTHH:MM:SS] cpu98:2097948)PCIEHP: 1496: 0000:a8:01.0: hotplug slot:0x1 (0000:a9:00.0) Adapter removed.
[YYYY-MM-DDTHH:MM:SS] cpu98:2097948)PCIEHP: 380: 0000:a8:01.0: hotplug slot:0x1: Setting PowerIndicator State BLINKING
[YYYY-MM-DDTHH:MM:SS] cpu98:2097948)PCIEHP: 1048: 0000:a8:01.0: Disabling hotplug slot:0x1
[YYYY-MM-DDTHH:MM:SS] cpu3:2097947)PCIEHP: 1477: 0000:a8:01.0: hotplug slot:0x1 (0000:a9:00.0) Adapter inserted.
[YYYY-MM-DDTHH:MM:SS] cpu3:2097947)PCIEHP: 380: 0000:a8:01.0: hotplug slot:0x1: Setting PowerIndicator State BLINKING
[YYYY-MM-DDTHH:MM:SS] cpu3:2097947)PCIEHP: 982: 0000:a8:01.0: Enabling hotplug slot:0x1
[YYYY-MM-DDTHH:MM:SS] cpu3:2097947)AMDIommu: 996: IOMMU 0000:a0:00.2: Prepared IOMMU for hotplug device 0000:a9:00.0
[YYYY-MM-DDTHH:MM:SS] cpu3:2097947)WARNING: PCIEHP: 641: 0000:a8:01.0: hotplug slot: 0x1: Device insertion detected while prior device 0000:a9:00.0 removal is still pending
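To check a host for the same pattern, the log can be filtered for the PCIEHP messages shown above. This is a minimal sketch, not an official diagnostic: the function names are hypothetical, the match strings are copied from the excerpt in this article, and it assumes the host's grep supports the -E option.

```shell
#!/bin/sh
# Sketch: find PCIe hot-plug removal/insertion events in a vmkernel log.
# hotplug_events and hotplug_event_count are hypothetical helper names.
# $1 is the log path; it defaults to the path given in this article.

# Print every PCIEHP line that reports an adapter removal, an adapter
# insertion, or an insertion that arrived while a removal was still pending.
hotplug_events() {
    grep -E 'PCIEHP: .*(Adapter (removed|inserted)|removal is still pending)' \
        "${1:-/var/run/log/vmkernel.log}"
}

# Print only the number of matching lines.
hotplug_event_count() {
    grep -c -E 'PCIEHP: .*(Adapter (removed|inserted)|removal is still pending)' \
        "${1:-/var/run/log/vmkernel.log}"
}
```

Calling hotplug_events with no argument reads the default log path. A non-zero count recorded around the time of a failed power-on suggests the behavior described in this article.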

 

Environment

VMware vSphere ESXi 8.X

Cause

This is a known issue on the affected HPE servers. See the HPE advisory: https://support.hpe.com/hpesc/public/docDisplay?docId=a00121002en_us

Resolution

To avoid this issue, disable PCIe device hot-plug on the VMware ESXi host installed on the server:

 

1. On the bare metal ESXi host, enter the command:

  • esxcli system settings kernel set -s enablePCIEHotplug -v FALSE

2. Reboot the ESXi host.

 

3. Verify that PCIe device hot-plug is disabled by entering the command:

  • esxcli system settings kernel list -o enablePCIEHotplug

4. Confirm that the entry FALSE is displayed under the Runtime column.

 

5. After this setting is changed, the VMs function properly when running the GPUs in VMware passthrough mode.
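The verification in steps 3 and 4 can be scripted. The sketch below reads the table printed by the list command and extracts the value under the Runtime header. The helper name runtime_value is hypothetical, and the sketch assumes that the output is a column-aligned table with a header row containing "Runtime" followed by a dashed separator row; the exact layout may differ between ESXi releases, so the column is located by character offset rather than by a fixed field number.

```shell
#!/bin/sh
# Sketch: read the Runtime value of a kernel setting from the table that
# `esxcli system settings kernel list -o <name>` prints.

# runtime_value (hypothetical helper): reads the table on stdin and prints
# the first data row's entry under the "Runtime" header. Assumes aligned
# columns, so the header's character offset also locates the value.
runtime_value() {
    awk '
        !col && /Runtime/ { col = index($0, "Runtime"); next }  # header row
        /^-/              { next }                              # separator row
        col               { split(substr($0, col), f, " "); print f[1]; exit }
    '
}

# Run the check only where esxcli exists (that is, on the ESXi host itself).
if command -v esxcli >/dev/null 2>&1; then
    state=$(esxcli system settings kernel list -o enablePCIEHotplug | runtime_value)
    if [ "$state" = "FALSE" ]; then
        echo "PCIe hot-plug is disabled (Runtime=FALSE)"
    else
        echo "Runtime=$state: run the set command from step 1 and reboot the host"
    fi
fi
```

If Runtime is not FALSE after the setting was changed, the most likely explanation is that the host has not been rebooted yet (step 2).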

 

 

Additional Information

Impact/Risks:

VM power-on fails.